Abstract

Regular ΣX-forests continue to play an important role in programming languages, specifically in the design of type systems [MiR85, AM91, Vol93]. They arise naturally as terms of constructor-based, recursive data types in logic and functional languages. Deciding whether the intersection of a sequence of regular ΣX-forests is nonempty is an important problem in type inference. We show that this problem is PSPACE-hard and, as a corollary, that the problem of constructing a regular ΣX-grammar representing their intersection is PSPACE-hard.

1 Introduction

Regular ΣX-forests are playing an increasingly important role in language design, in particular in the design of type systems. Type inference then usually relies on various operations over regular forests, one of which is RF-INT: deciding the emptiness of their intersection.

Definition 1.1 The problem RF-INT is: given a sequence of regular ΣX-grammars $G_1, \ldots, G_m$, decide whether $\bigcap_{k=1}^{m} T(G_k)$ is nonempty.


Regular forests have been used to characterize the types of logic and functional programs [Mis84, MiR85, HeJ90, AM91] as well as overloadings introduced through classes in Haskell [Kae88, Vol93]. For example, Heintze and Jaffar propose what amounts to regular ΣX-grammars as inferred "types", or approximations of the semantics of logic programs. Corresponding to a logic program, say

p(a).
p(f(X)) ← p(X).
r(b).
r(f(Y)) ← r(Y).
q(Z) ← p(Z), r(Z).

is a set of equations

$$\begin{aligned} X &= a \cup f(X) \\ Y &= b \cup f(Y) \\ Z &= X \cap Y \end{aligned}$$

whose simultaneous least fixed point is an approximate meaning of the program. The inferred approximation, or "type", is given by

$$\begin{aligned} X &= a \cup f(X) \\ Y &= b \cup f(Y) \\ Z &= \emptyset \end{aligned}$$

Solving for the variable Z requires deciding whether the intersection of the two regular forests described by the first two equations is nonempty.
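As an illustration, the simultaneous least fixed point can be approximated by Kleene iteration starting from empty sets. The Python sketch below uses a hypothetical encoding of our own (terms as nested tuples such as `("f", ("a",))`) and is not part of the original development. It only computes finitely many approximants, so it suggests rather than proves that Z = ∅; the exact decision comes from the product construction of Theorem 2.1.

```python
def kleene_approximants(rounds):
    """Iterate X = a ∪ f(X), Y = b ∪ f(Y), Z = X ∩ Y starting from empty sets.

    Terms are nested tuples: ("a",), ("b",) and ("f", t).
    """
    X, Y, Z = set(), set(), set()
    for _ in range(rounds):
        # Simultaneous update: every right-hand side reads the previous stage.
        X, Y, Z = (
            {("a",)} | {("f", t) for t in X},
            {("b",)} | {("f", t) for t in Y},
            X & Y,
        )
    return X, Y, Z


def show(t):
    """Render a nested-tuple term as text, e.g. ("f", ("a",)) -> "f(a)"."""
    if len(t) == 1:
        return t[0]
    return t[0] + "(" + ", ".join(show(s) for s in t[1:]) + ")"


X, Y, Z = kleene_approximants(4)
print(sorted(show(t) for t in X))  # ['a', 'f(a)', 'f(f(a))', 'f(f(f(a)))']
print(Z)                           # set()
```

Every approximant of X contains only terms ending in a and every approximant of Y only terms ending in b, which is why the approximants of Z stay empty.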

One can also view the logic program above as describing a set of valid overloadings in Haskell for p and r as operators, where p has instances at types a and f, and r at types b and f:

class P a where p :: a
instance P a where p = ...
instance P X => P f(X) where p = ...

class R a where r :: a
instance R b where r = ...
instance R Y => R f(Y) where r = ...

Instance declarations for an overloaded operator in Haskell describe a regular forest. So, for example, deciding whether the term p = r is typable requires deciding whether the regular forest arising from p's instance declarations intersects the forest described by the instances for r.

2 Forests and Regular ΣX-grammars

Given an alphabet A, an A-valued tree t is specified by its set of nodes (the "domain" dom(t)) and a valuation of the nodes in A. Formally, a k-ary, A-valued tree is a map $t : \mathrm{dom}(t) \to A$ where $\mathrm{dom}(t) \subseteq \{0, \ldots, k-1\}^*$ is a nonempty set, closed under prefixes. The frontier of t is the set

$$\{w \in \mathrm{dom}(t) \mid \neg \exists i.\ wi \in \mathrm{dom}(t)\}.$$
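The definitions of tree domain and frontier translate directly into code. In this sketch (our own encoding, not the paper's), a node address $w \in \{0, \ldots, k-1\}^*$ is a Python tuple of child indices, the root is the empty tuple, and $wi$ is `w + (i,)`.

```python
def is_tree_domain(dom, k):
    """Is dom a nonempty, prefix-closed subset of {0, ..., k-1}*?"""
    return (
        len(dom) > 0
        and all(0 <= i < k for w in dom for i in w)
        and all(w[:j] in dom for w in dom for j in range(len(w)))
    )


def frontier(dom):
    """{w in dom | there is no i with w·i in dom}: the nodes without children."""
    return {w for w in dom if not any(v[:-1] == w for v in dom if v)}


# σ(x, y) as a map from addresses to labels: root (), leaves (0,) and (1,).
t = {(): "sigma", (0,): "x", (1,): "y"}
print(is_tree_domain(set(t), k=2))  # True
print(sorted(frontier(set(t))))     # [(0,), (1,)]
```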

It is assumed that A is partitioned into a ranked alphabet Σ and a frontier alphabet X. A ranked alphabet, or signature, is a finite nonempty operator domain. For any Σ and X, we denote the set of all finite ΣX-trees by $F_\Sigma(X)$. A forest, or tree language, $T \subseteq F_\Sigma(X)$ is called regular if and only if, for some finite set C disjoint from Σ and X, T can be obtained from finite subsets of $F_\Sigma(X \cup C)$ by applications of union, concatenation $\cdot_c$ (defined using tree substitution), and closure $^{*c}$, where $c \in C$ [Tho90].
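To make the regular operations concrete, here is a small sketch of concatenation by tree substitution. Conventions differ between texts, so the following is an assumption: $T \cdot_c U$ replaces every c-labelled leaf of a tree in T by a tree of U, chosen independently per occurrence, and the closure $T^{*c}$ is approximated from below by $S_0 = \{c\}$, $S_{j+1} = S_j \cup (T \cdot_c S_j)$. Under these assumptions, $\{f(c)\}^{*c} \cdot_c \{a\}$ yields the forest $\{a, f(a), f(f(a)), \ldots\}$ of the introductory example.

```python
from itertools import product


def substitute(t, c, U):
    """All trees obtained from t by replacing each leaf labelled c with a tree
    of U; the choice is made independently for every occurrence of c."""
    if t == (c,):
        return set(U)
    head, kids = t[0], t[1:]
    choices = [substitute(s, c, U) for s in kids]
    # product() of zero iterables yields one empty tuple, so other leaves survive.
    return {(head, *combo) for combo in product(*choices)}


def concat(T, c, U):
    """T ·_c U under the substitution convention assumed above."""
    return {r for t in T for r in substitute(t, c, U)}


def closure(T, c, rounds):
    """Under-approximation of T^{*c}: S_0 = {c}, S_{j+1} = S_j ∪ (T ·_c S_j)."""
    S = {(c,)}
    for _ in range(rounds):
        S = S | concat(T, c, S)
    return S


# {f(c)}^{*c} ·_c {a}, cut off after three closure steps.
forest = concat(closure({("f", ("c",))}, "c", 3), "c", {("a",)})
```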

A regular forest can alternatively be defined as a tree language generated by a regular ΣX-grammar [GeS84].

Definition 2.1 A regular ΣX-grammar G consists of

• a finite nonempty set N of nonterminal symbols,

• a finite set P of productions of the form $A \to r$ where $A \in N$ and $r \in F_\Sigma(N \cup X)$, and

• an initial symbol $S \in N$.

Definition 2.2 If $G = (N, \Sigma, X, P, S)$ is a regular ΣX-grammar, then the ΣX-forest generated by G is

$$T(G) = \{t \in F_\Sigma(X) \mid S \Rightarrow_G^* t\}.$$
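Trees of T(G) can be explored by generating derivable trees level by level. The sketch below assumes the grammar is given in the common flat form where every right-hand side is a single symbol applied to nonterminals, $A \to \sigma(B_1, \ldots, B_n)$ with $n \ge 0$ (frontier letters are the case n = 0); the dictionary encoding is our own. It enumerates only trees up to a given height, so it is a finite window onto T(G), not a decision procedure.

```python
from itertools import product


def generate(prods, start, height):
    """Trees of T(G) with at most `height` levels.

    prods maps each nonterminal to a list of flat right-hand sides
    (sym, (B1, ..., Bn)); trees are nested tuples (sym, child1, ..., childn).
    """
    layer = {A: set() for A in prods}  # trees derivable from A so far
    for _ in range(height):
        # Children come from the previous layer, so each pass adds one level.
        layer = {
            A: {(sym, *kids)
                for sym, args in rhss
                for kids in product(*(layer[B] for B in args))}
            for A, rhss in prods.items()
        }
    return layer[start]


# X -> a | f(X), the first equation of the introduction.
G_X = {"X": [("a", ()), ("f", ("X",))]}
# S -> sigma(P, Q) | sigma(Q, P), P -> x, Q -> y: the forest {σ(x, y), σ(y, x)}.
G_S = {"S": [("sigma", ("P", "Q")), ("sigma", ("Q", "P"))],
       "P": [("x", ())], "Q": [("y", ())]}

print(len(generate(G_X, "X", 3)))  # 3: a, f(a), f(f(a))
```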

Regular ΣX-grammars are a class of context-free grammars that define the same family of forests as those recognized by nondeterministic root-to-frontier (NDR) ΣX-automata. A root-to-frontier automaton can be viewed as an attribute evaluator for a tree whose attributes are states prescribed by an attribute grammar with inherited attributes only. Formally, an NDR ΣX-automaton $\mathcal{A}$ is a tuple $(\mathbf{A}, A', \alpha)$ such that

1. $\mathbf{A}$ is a finite NDR Σ-algebra $(A, \Sigma)$,

2. $A' \subseteq A$ is a set of initial states, and

3. $\alpha : X \to \wp A$ is a final assignment.

In an NDR Σ-algebra $(A, \Sigma)$, A is a nonempty set of states and every $\sigma \in \Sigma_m$ with $m \ge 1$ is realized as a mapping $\sigma^{\mathbf{A}} : A \to \wp(A^m)$. For $\sigma \in \Sigma_0$, $\sigma^{\mathbf{A}}$ is a subset of A.

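For example, an NDR ΣX-automaton $\mathcal{A} = (\mathbf{A}, A', \alpha)$ recognizing the set

$$\{\sigma(x, y),\ \sigma(y, x)\}$$

can be defined as follows. Let $\Sigma = \Sigma_2 = \{\sigma\}$, $X = \{x, y\}$, and the set of initial states $A' = \{S\}$. Define $\mathbf{A} = (\{x, y, S\}, \Sigma)$ such that

$$\sigma^{\mathbf{A}}(S) = \{(x, y), (y, x)\}$$

and finally define the final assignment α as

$$x\alpha = \{y\}, \qquad y\alpha = \{x\}.$$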
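A run of an NDR automaton can be simulated by trying every tuple of states that $\sigma^{\mathbf{A}}$ offers at each node. The Python sketch below encodes the example automaton; the dictionary layout and the string `"sigma"` for σ are our own choices, and leaves are treated only as frontier letters because $\Sigma_0$ is empty here.

```python
def accepts(tree, state, sigma, alpha):
    """Can the automaton, entering the root of `tree` in `state`, accept it?

    sigma maps (symbol, state) to a set of state tuples, playing the role of
    σ^A; alpha maps each frontier letter to its set of accepting states.
    """
    head, kids = tree[0], tree[1:]
    if not kids:  # frontier node: consult the final assignment
        return state in alpha.get(head, set())
    # Nondeterminism: some offered tuple must let every child accept.
    return any(
        len(qs) == len(kids)
        and all(accepts(k, q, sigma, alpha) for k, q in zip(kids, qs))
        for qs in sigma.get((head, state), set())
    )


SIGMA = {("sigma", "S"): {("x", "y"), ("y", "x")}}
ALPHA = {"x": {"y"}, "y": {"x"}}
INITIAL = {"S"}


def recognized(tree):
    return any(accepts(tree, q, SIGMA, ALPHA) for q in INITIAL)


for u, v in [("x", "y"), ("y", "x"), ("x", "x"), ("y", "y")]:
    print(f"sigma({u}, {v})", recognized(("sigma", (u,), (v,))))
```

The loop reports acceptance for σ(x, y) and σ(y, x) only: σ(x, y) is accepted through the choice (y, x), since leaf x is entered in state y ∈ xα and leaf y in state x ∈ yα.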

It is interesting to note that there is no deterministic root-to-frontier ΣX-automaton that accepts the set above. Suppose a deterministic automaton $\mathcal{A}$ with initial state a accepts σ(x, y) and σ(y, x), and that $\sigma^{\mathbf{A}}(a) = (a_1, a_2)$ for some states $a_1$ and $a_2$ of $\mathcal{A}$. If α is $\mathcal{A}$'s final assignment, then

$$a_1 \in x\alpha, \quad a_2 \in y\alpha, \quad a_1 \in y\alpha, \quad a_2 \in x\alpha.$$

Hence both $a_1$ and $a_2$ lie in $x\alpha \cap y\alpha$. Since $\mathcal{A}$ is deterministic, on σ(x, x) and σ(y, y) it again enters the two leaves in states $a_1$ and $a_2$, each of which is accepting for both x and y. Thus $\mathcal{A}$ accepts σ(x, x) and σ(y, y) as well.
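The argument can also be checked mechanically on trees of this shape. On a tree σ(u, v), a deterministic root-to-frontier automaton only consults its initial state a, the pair $\sigma^{\mathbf{A}}(a) = (a_1, a_2)$, and the final assignment, so automata with at most three states (after renaming a to 0) already cover every case. The exhaustive search below is our own illustration under that encoding; it restricts attention to the four trees σ(u, v).

```python
from itertools import product

LEAVES = list(product("xy", repeat=2))  # (u, v) stands for the tree σ(u, v)
TARGET = {("x", "y"), ("y", "x")}


def deterministic_languages(n):
    """For each deterministic automaton with states 0..n-1 and initial state 0,
    yield the set of trees σ(u, v) it accepts."""
    states = range(n)
    subsets = [frozenset(s for s in states if mask >> s & 1)
               for mask in range(2 ** n)]
    for a1, a2 in product(states, repeat=2):  # σ^A(0) = (a1, a2)
        for x_alpha, y_alpha in product(subsets, repeat=2):
            alpha = {"x": x_alpha, "y": y_alpha}
            # σ(u, v) is accepted iff a1 ∈ uα and a2 ∈ vα.
            yield {(u, v) for u, v in LEAVES if a1 in alpha[u] and a2 in alpha[v]}


found = any(lang == TARGET for n in (1, 2, 3) for lang in deterministic_languages(n))
print(found)  # False
```

Every automaton in the search that accepts both σ(x, y) and σ(y, x) also accepts σ(x, x) and σ(y, y), matching the argument above.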

Given that regular ΣX-grammars define exactly the forests recognized by NDR ΣX-automata, one could formulate RF-INT in terms of the latter representation of regular forests. We choose regular ΣX-grammars instead, since they are better suited for manipulation.

Regular forests are effectively closed under intersection.

Theorem 2.1 If $G_1$ and $G_2$ are regular ΣX-grammars, for a given Σ and X, then $T(G_1) \cap T(G_2)$ is a forest generated by a regular ΣX-grammar.

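Proof. Suppose $G_1 = (N_1, \Sigma, X, P_1, S_1)$ and $G_2 = (N_2, \Sigma, X, P_2, S_2)$ are regular ΣX-grammars where, as we may assume without loss of generality, every production has the form $A \to \sigma(Y_1, \ldots, Y_n)$ with $\sigma \in \Sigma_n$ and $Y_1, \ldots, Y_n \in N$, or $A \to x$ with $x \in X$. Let $G = (N_1 \times N_2, \Sigma, X, P, [S_1, S_2])$ be the ΣX-grammar where

$$[A, B] \to \sigma([Y_1, Z_1], \ldots, [Y_n, Z_n]) \in P \quad \text{for } n \ge 0$$

if and only if

$$A \to \sigma(Y_1, \ldots, Y_n) \in P_1 \quad \text{and} \quad B \to \sigma(Z_1, \ldots, Z_n) \in P_2$$

for some $\sigma \in \Sigma$, and $[A, B] \to x \in P$ if and only if $x \in X$, $A \to x \in P_1$, and $B \to x \in P_2$. Then $T(G) = T(G_1) \cap T(G_2)$. □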
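The construction in the proof can be sketched in Python for grammars in the flat form used there. Two assumptions are ours: the dictionary encoding of productions, and the restriction to pairs [A, B] reachable from $[S_1, S_2]$, which generates the same forest while skipping unused pairs. On the introductory example it shows why Z = ∅: the only production for [X, Y] is [X, Y] → f([X, Y]), which never terminates.

```python
from collections import deque


def intersect(g1, s1, g2, s2):
    """Product grammar of Theorem 2.1, built only for pairs reachable from [S1, S2].

    A grammar maps each nonterminal to a list of flat right-hand sides
    (sym, (B1, ..., Bn)); frontier letters and constants have n = 0.
    """
    start = (s1, s2)
    prods, queue = {}, deque([start])
    while queue:
        pair = queue.popleft()
        if pair in prods:
            continue
        a, b = pair
        prods[pair] = []
        for sym1, ys in g1[a]:
            for sym2, zs in g2[b]:
                # Same symbol (hence same arity): pair the children position-wise.
                if sym1 == sym2 and len(ys) == len(zs):
                    args = tuple(zip(ys, zs))  # ([Y1, Z1], ..., [Yn, Zn])
                    prods[pair].append((sym1, args))
                    queue.extend(args)
    return prods, start


G1 = {"X": [("a", ()), ("f", ("X",))]}  # X = a ∪ f(X)
G2 = {"Y": [("b", ()), ("f", ("Y",))]}  # Y = b ∪ f(Y)
prods, start = intersect(G1, "X", G2, "Y")
print(prods)  # {('X', 'Y'): [('f', (('X', 'Y'),))]}
```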

The theorem implies that the family of regular forests is properly contained in the family of context-free languages, since the latter is not closed under intersection.

We now state and prove the main result.

Theorem 2.2 RF-INT is PSPACE-hard.

Proof. The proof uses a result of [Koz77]. For every deterministic Turing machine M of polynomial space complexity, we give a log-space transducer that, on input x, outputs a sequence of regular ΣX-grammars whose intersection is nonempty iff M accepts x.

Let M be a single-tape DTM of polynomial space complexity $p(n) \ge n$, and assume that M always makes an odd number of moves, at least three, has a unique accepting state $q_{acc}$, and erases its tape before accepting, positioning its tape head at the left end of the tape. Let $x = a_1 \cdots a_n$ be a string over M's input alphabet and suppose M has states Q and tape symbols Γ such that Q, Γ, and the set {nil, #, ##} are pairwise disjoint. If

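$$\Delta = \Gamma \cup \{[qX] \mid q \in Q \text{ and } X \in \Gamma\}$$

then the ranked alphabet is $\Sigma = \Sigma_0 \cup \Sigma_1 \cup \Sigma_2 \cup \Sigma_3$, where $\Sigma_0 = \{\mathit{nil}\}$, $\Sigma_1 = \Delta$, $\Sigma_2 = \{\#\#\}$, and $\Sigma_3 = \{\#\}$. Suppose $ID_\Delta$ derives the regular forest

$$Z_1(Z_2(\cdots Z_{p(n)}(\mathit{nil}) \cdots))$$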


for all $Z_k \in \Delta$, $1 \le k \le p(n)$, and $ID_i^{[X_1X_2X_3]}$ derives the regular forest

$$Z_1(\cdots Z_{i-1}(X_1(X_2(X_3(Z_i(\cdots Z_{p(n)-3}(\mathit{nil}) \cdots))))) \cdots)$$

for all $X_1, X_2, X_3, Z_k \in \Delta$, $1 \le k \le p(n) - 3$.
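The forests $ID_\Delta$ and $ID_i^{[X_1X_2X_3]}$ need only a chain of p(n) + 1 nonterminals, which keeps each output grammar polynomial in size. The sketch below is a hypothetical encoding (nonterminal k emits position k; symbols of Δ are strings) that builds a flat grammar for $ID_i^{[X_1X_2X_3]}$ with $(p(n)-3)|\Delta| + 4$ productions.

```python
def window_grammar(delta, p, i, xs):
    """Flat grammar for ID_i^{[X1X2X3]} over positions 1..p.

    Nonterminal k emits the symbol at position k and continues with k + 1;
    positions i, i+1, i+2 are fixed to xs, every other position ranges over
    delta, and nonterminal p + 1 emits nil.
    """
    prods = {}
    for k in range(1, p + 1):
        symbols = [xs[k - i]] if i <= k <= i + 2 else sorted(delta)
        prods[k] = [(z, (k + 1,)) for z in symbols]
    prods[p + 1] = [("nil", ())]
    return prods, 1


def derives_unary(prods, start, word):
    """Does the grammar derive the unary tree word[0](word[1](...(nil)...))?"""
    current = {start}
    for z in word:
        current = {args[0] for A in current
                   for sym, args in prods[A] if sym == z and len(args) == 1}
    return any(("nil", ()) in prods[A] for A in current)


DELTA = {"0", "1", "B", "[q0]"}  # hypothetical Δ: tape symbols plus one [qX]
prods, start = window_grammar(DELTA, p=6, i=2, xs=("[q0]", "1", "B"))
print(sum(len(r) for r in prods.values()))  # 16 = (6 - 3) * 4 + 4
print(derives_unary(prods, start, ["0", "[q0]", "1", "B", "B", "B"]))  # True
```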

A computation of M consists of a sequence of instantaneous descriptions $ID_0 \vdash ID_1 \vdash \cdots \vdash ID_{2m+1}$, each containing the contents of M's tape padded with blanks (B's) to length p(n). If, according to a move of M, symbols $Y_1Y_2Y_3$ in positions i, i + 1, and i + 2, respectively, of an ID can follow from symbols $X_1X_2X_3$ in the same positions of another ID, we write

$$ID^{[X_1X_2X_3]} \vdash_M ID^{[Y_1Y_2Y_3]}.$$

We give two regular ΣX-grammars $F_i^{odd}$ and $F_i^{even}$ such that $F_i^{odd}$ ensures that even ID's follow from odd ones, and $F_i^{even}$ that odd ones follow from even ones. Let $F_i^{odd}$ be a regular ΣX-grammar with empty frontier alphabet, start symbol S and productions

$$S \to \#(ID_\Delta,\ ID_i^{[Z_1Z_2Z_3]},\ F_i^{[Z_1Z_2Z_3]})$$

for all $Z_k \in \Delta$, $1 \le k \le 3$,

$$F_i^{[X_1X_2X_3]} \to \#(ID_i^{[Y_1Y_2Y_3]},\ ID_i^{[Z_1Z_2Z_3]},\ F_i^{[Z_1Z_2Z_3]})$$

for all $X_k, Y_k, Z_k \in \Delta$, $1 \le k \le 3$, such that $ID^{[X_1X_2X_3]} \vdash_M ID^{[Y_1Y_2Y_3]}$, and

$$F_i^{[X_1X_2X_3]} \to \#\#(ID_i^{[Y_1Y_2Y_3]},\ ID_\Delta)$$

for all $X_k, Y_k \in \Delta$, $1 \le k \le 3$, such that $ID^{[X_1X_2X_3]} \vdash_M ID^{[Y_1Y_2Y_3]}$.

Let $F_i^{even}$ be a regular ΣX-grammar with empty frontier alphabet, start symbol S and productions

$$S \to \#(ID_i^{[X_1X_2X_3]},\ ID_i^{[Y_1Y_2Y_3]},\ S)$$

$$S \to \#\#(ID_i^{[X_1X_2X_3]},\ ID_i^{[Y_1Y_2Y_3]})$$

for all $X_k, Y_k \in \Delta$, $1 \le k \le 3$, such that $ID^{[X_1X_2X_3]} \vdash_M ID^{[Y_1Y_2Y_3]}$.
Finally, suppose initID derives the unary tree

$$[q_0 a_1](a_2(\cdots a_n(B_{n+1}(\cdots B_{p(n)}(\mathit{nil}) \cdots)) \cdots))$$

where each $B_k$ is a blank and $q_0$ is the start state of M, and finalID derives

$$[q_{acc}B](B_2(\cdots B_{p(n)}(\mathit{nil}) \cdots))$$
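Then let $F_{end}$ be a regular grammar with start symbol S and productions

$$\begin{aligned} S &\to \#(\mathit{initID},\ ID_\Delta,\ F_{acc}) \\ F_{acc} &\to \#(ID_\Delta,\ ID_\Delta,\ F_{acc}) \\ F_{acc} &\to \#\#(ID_\Delta,\ \mathit{finalID}) \end{aligned}$$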
Then we have

$$u \in \bigcap_{i=1}^{p(n)-2} T(F_i^{odd})$$

iff $u = \#(ID_0, ID_1, \#(\cdots \#(ID_{2m-2}, ID_{2m-1}, \#\#(ID_{2m}, ID_{2m+1})) \cdots))$ and $ID_{2k}$ follows from $ID_{2k-1}$ according to the transition rules of M for $1 \le k \le m$.
Likewise,

$$u \in \bigcap_{i=1}^{p(n)-2} T(F_i^{even})$$

iff $u = \#(ID_0, ID_1, \#(\cdots \#(ID_{2m-2}, ID_{2m-1}, \#\#(ID_{2m}, ID_{2m+1})) \cdots))$ and $ID_{2k+1}$ follows from $ID_{2k}$ according to the rules of M for $0 \le k \le m$. Then

$$T(F_{end}) \cap \bigcap_{i=1}^{p(n)-2} \bigl(T(F_i^{odd}) \cap T(F_i^{even})\bigr)$$

is nonempty iff M accepts x. □
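Counting the grammars in this intersection, the sequence output on input x has the following length, which is linear in p(n):

$$\underbrace{(p(n)-2)}_{F_i^{odd}} + \underbrace{(p(n)-2)}_{F_i^{even}} + \underbrace{1}_{F_{end}} = 2p(n) - 3.$$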

As is the case for emptiness of the intersection of a sequence of DFA's, the source of the hardness of RF-INT lies not in deciding emptiness but rather in computing the intersection of regular forests.

Corollary 2.3 Given regular ΣX-grammars $G_1, \ldots, G_m$, constructing a regular ΣX-grammar G such that $T(G) = \bigcap_{k=1}^{m} T(G_k)$ is PSPACE-hard.

Proof. The emptiness of T(G) for a regular ΣX-grammar G is decidable in time $O(|G|^2)$ in the usual way. From the proof of Theorem 2.2, then, every problem in PSPACE is polynomial-time Turing reducible to the problem of constructing the intersection of a sequence of regular ΣX-grammars. □
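One standard way to decide emptiness is to compute the productive nonterminals, those deriving at least one finite tree, as a least fixed point. The sketch below assumes a flat dictionary encoding of productions, one `(sym, (B1, ..., Bn))` pair per right-hand side; each pass scans all productions once and at most |N| passes make progress, which gives a quadratic bound.

```python
def productive(prods):
    """Nonterminals deriving at least one finite tree (a least fixed point)."""
    done = set()
    changed = True
    while changed:  # at most |N| + 1 passes
        changed = False
        for A, rhss in prods.items():
            # A is productive once some right-hand side has only productive children.
            if A not in done and any(all(B in done for B in args) for _, args in rhss):
                done.add(A)
                changed = True
    return done


def is_empty(prods, start):
    """T(G) is empty iff the start symbol derives no finite tree."""
    return start not in productive(prods)


# The product grammar for X = a ∪ f(X) and Y = b ∪ f(Y): [X, Y] -> f([X, Y]).
print(is_empty({("X", "Y"): [("f", (("X", "Y"),))]}, ("X", "Y")))  # True
print(is_empty({"X": [("a", ()), ("f", ("X",))]}, "X"))            # False
```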


A simple algorithm for constructing G is based on the usual construction of forming the cartesian product of reachable states, as suggested in the proof of Theorem 2.1 [AiM91]. It has worst-case time complexity exponential in m. Unfortunately, this naive construction is likely the best we can do. It should be pointed out that, for a fixed m, constructing G from $G_1, \ldots, G_m$ can be done in polynomial time.
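The multiplicative growth of the naive construction is easy to observe. In the sketch below (our own example, with a hypothetical family of grammars), $G_p$ generates the trees $f^k(a)$ with k divisible by p. Intersecting $G_2$, $G_3$, and $G_5$ with the reachable-product construction yields 2 · 3 · 5 = 30 product nonterminals, one for each residue of k modulo 30.

```python
from collections import deque
from itertools import product


def intersect_all(grammars, starts):
    """Naive m-fold product of flat grammars, restricted to reachable tuples."""
    start = tuple(starts)
    prods, queue = {}, deque([start])
    while queue:
        state = queue.popleft()
        if state in prods:
            continue
        prods[state] = []
        # Choose one production per component; keep choices sharing a symbol.
        for combo in product(*(g[a] for g, a in zip(grammars, state))):
            same_symbol = len({sym for sym, _ in combo}) == 1
            same_arity = len({len(args) for _, args in combo}) == 1
            if same_symbol and same_arity:
                # Child j of the product is the tuple of every component's child j.
                args = tuple(zip(*(args for _, args in combo)))
                prods[state].append((combo[0][0], args))
                queue.extend(args)
    return prods, start


def mod_grammar(p):
    """Hypothetical G_p: from nonterminal 0 it derives f^k(a) exactly when p divides k."""
    return {k: [("f", ((k + 1) % p,))] + ([("a", ())] if k == 0 else [])
            for k in range(p)}


prods, start = intersect_all([mod_grammar(p) for p in (2, 3, 5)], [0, 0, 0])
print(len(prods))  # 30
```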

Deciding whether a given number of DFA's accept a common string can be done in nondeterministic linear space, but this does not appear to be true for RF-INT, which can be decided in deterministic exponential time. This suggests that a tighter lower bound exists for RF-INT.
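For comparison, nonemptiness of the intersection of DFA's reduces to reachability in the product automaton. The Python sketch below uses our own encoding (a DFA is a triple of a complete transition dictionary, a start state, and a set of final states) and searches that graph breadth-first, returning a shortest common string. A nondeterministic algorithm would instead guess one letter at a time and keep only the current tuple of states, which is where the linear space bound comes from.

```python
from collections import deque


def common_string(dfas, alphabet):
    """Return a shortest string accepted by every DFA, or None.

    Each DFA is (delta, start, finals) with delta[(state, letter)] -> state.
    """
    start = tuple(d[1] for d in dfas)
    parent = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if all(q in d[2] for q, d in zip(cur, dfas)):
            # Walk parent pointers back to the start to rebuild the string.
            word = []
            while parent[cur] is not None:
                cur, letter = parent[cur]
                word.append(letter)
            return "".join(reversed(word))
        for letter in alphabet:
            nxt = tuple(d[0][(q, letter)] for q, d in zip(cur, dfas))
            if nxt not in parent:
                parent[nxt] = (cur, letter)
                queue.append(nxt)
    return None


EVEN_A = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ODD_A = (EVEN_A[0], 0, {1})
ENDS_B = ({(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})

print(common_string([EVEN_A, ENDS_B], "ab"))  # b
print(common_string([EVEN_A, ODD_A], "ab"))   # None
```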